Abstract

In this paper, we prove the following singular value inequality: Let $\mathbb{R}^{n_1 \times n_2}$ be the set of all
$n_1 \times n_2$ real matrices, let $n = \min\{n_1, n_2\}$, and let $\sigma_i(\cdot)$ denote the $i$-th largest singular value. Then for any
$p \in (0,1)$ and $A, B \in \mathbb{R}^{n_1 \times n_2}$,

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|.$$

This resolves in the affirmative a conjecture by Oymak et al. [1]. We also give applications of
this inequality to the low rank matrix recovery problem. We introduce a new notion of restricted
isometry property of matrices and then use it to analyse the non-convex Schatten-p quasi-norm
minimization program. Finally, we prove a sufficient condition based on our new matrix RIP for the
program to recover the desired low rank matrix and give a probabilistic guarantee for the condition.

1 Introduction 

Low-rank matrix recovery, with its applications to image compression, low-dimensional graph
realization, machine learning, etc., has been the focus of much research recently. Although such problems
can be tackled by tailor-made algorithms, the general rank minimization problem is NP-hard, as it
contains vector cardinality minimization as a special case. A popular heuristic is to use the nuclear
norm minimization program instead. Fazel, Parrilo and Recht [6] adopted this approach and generalized
the idea of compressed sensing by establishing a one-to-one correspondence between almost every
single notion in sparse vector recovery and low-rank matrix recovery. In particular, they gave a matrix
version of the restricted isometry property (RIP) and derived recovery conditions based on it.
Recently, Oymak et al. [1] made use of a standard result [1] from linear algebra to show that certain
classes of sparse vector recovery results can be translated into low-rank matrix recovery results
in a systematic way, demonstrating an even more transparent relation between the sparse vector recovery
and low-rank matrix recovery problems. Another popular heuristic for low-rank matrix recovery is to
use the Schatten-p quasi-norm minimization program. This approach has been shown empirically to give
better results than nuclear norm minimization [15], [16].

Our contribution in this paper is twofold. First, we prove the singular value inequality proposed by
Oymak et al. in [1]: Let $\mathbb{R}^{n_1 \times n_2}$ be the set of all $n_1 \times n_2$ real matrices, let $n = \min\{n_1, n_2\}$, and let $\sigma_i(\cdot)$ denote
the $i$-th largest singular value. Then for any $p \in (0,1)$ and $A, B \in \mathbb{R}^{n_1 \times n_2}$,

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|. \tag{1}$$

Our proof of this inequality is inspired by Fiedler's paper [2], in which the author established upper
and lower bounds for the determinant of the sum of two Hermitian matrices. However, some new
ideas are needed. In particular, using a standard technique in matrix analysis, we first show that
(1) holds as long as it holds for symmetric matrices. Then, we introduce the notion of an eigen-sorting
permutation along with a useful lemma (see lemma 2) to address the ordering of singular values
under matrix perturbation. Second, we give some applications of this inequality to compressed
sensing. We introduce a new notion of restricted isometry property for low-rank matrix recovery, and
derive an RIP-based sufficient condition on the measurement operator so that a low rank matrix can be
recovered by the Schatten-p quasi-norm minimization program. We also show that this condition is
satisfied with overwhelmingly high probability when a certain family of random linear measurement
operators is used.



2 The Conjectured Singular Value Inequality 

The main result of this section is a proof of the singular value inequality (1). In subsection 2.1 we use a
standard trick in matrix analysis and operator theory to show that the conjecture is true for general
rectangular matrices provided that it is true for symmetric matrices. In subsection 2.2 we prove that the
conjecture is indeed true for symmetric matrices. Finally, by applying similar techniques, we
also prove another recent conjecture proposed by Miao [5], which is a generalized version of (1).

2.1 Proof of the singular value inequality for rectangular matrices 

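The purpose of this subsection is to explain how to use the matrix dilation to extend the inequality
to rectangular matrices if it is true for symmetric matrices. Consider a matrix $Z \in \mathbb{R}^{n_1 \times n_2}$, assume
without loss of generality that $n_1 \le n_2$, with singular value decomposition $Z = U[Z_0 \ 0]V^T$, where
$Z_0 = \mathrm{diag}(\sigma_1(Z), \dots, \sigma_{n_1}(Z))$. Define the linear operator $\Xi : \mathbb{R}^{n_1 \times n_2} \to S^{n_1+n_2}(\mathbb{R})$ by

$$\Xi(Z) = \begin{pmatrix} 0 & Z \\ Z^T & 0 \end{pmatrix}. \tag{2}$$

Then we have the following result, see [17].

Lemma 1. Define the orthogonal matrix $Q \in \mathbb{R}^{(n_1+n_2) \times (n_1+n_2)}$ by

$$Q = \frac{1}{\sqrt{2}} \begin{pmatrix} U & U & 0 \\ V_1 & -V_1 & \sqrt{2}\, V_2 \end{pmatrix}, \tag{3}$$

where $V = [V_1 \ V_2]$ with $V_1 \in \mathbb{R}^{n_2 \times n_1}$ and $V_2 \in \mathbb{R}^{n_2 \times (n_2 - n_1)}$. Then

$$\Xi(Z) = Q \begin{pmatrix} Z_0 & 0 & 0 \\ 0 & -Z_0 & 0 \\ 0 & 0 & 0 \end{pmatrix} Q^T. \tag{4}$$

Observe that the eigenvalues of $\Xi(Z)$ are $\pm\sigma_i(Z)$, $i = 1, \dots, n_1$, and $0$ of multiplicity $n_2 - n_1$, and hence we have
$\sigma_i(\Xi(Z)) = \sigma_{\lceil i/2 \rceil}(Z)$ for $i = 1, \dots, 2n_1$, and $\sigma_i(\Xi(Z)) = 0$ otherwise.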



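Theorem 1. Suppose the inequality (1) is true for symmetric matrices. Let $A, B \in \mathbb{R}^{n_1 \times n_2}$, $n = \min\{n_1, n_2\}$ and $p \in (0,1)$. Then,

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|.$$

Proof: Assume without loss of generality that $n_1 \le n_2$, so that $n = n_1$. Since $\Xi$ is linear, $\Xi(A-B) = \Xi(A) - \Xi(B)$, and since
all singular values of a dilation beyond the first $2n_1$ vanish, the supposition gives

$$\sum_{i=1}^{2n_1} \sigma_i^p(\Xi(A-B)) \ge \sum_{i=1}^{2n_1} \left| \sigma_i^p(\Xi(A)) - \sigma_i^p(\Xi(B)) \right|. \tag{5}$$

Then by the above observation,

$$\sum_{i=1}^{n_1} \sigma_i^p(A-B) = \frac{1}{2} \sum_{i=1}^{2n_1} \sigma_i^p(\Xi(A-B)) \tag{6}$$

$$\ge \frac{1}{2} \sum_{i=1}^{2n_1} \left| \sigma_i^p(\Xi(A)) - \sigma_i^p(\Xi(B)) \right| \tag{7}$$

$$= \sum_{i=1}^{n_1} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|. \tag{8}$$

□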
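The dilation argument above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `dilation` and the test matrix are illustrative and not part of the paper. It builds the symmetric dilation of a random $3 \times 5$ matrix and compares its singular values with the pattern predicted by Lemma 1 (each singular value of $Z$ twice, followed by $n_2 - n_1$ zeros).

```python
import numpy as np

def dilation(Z):
    """Symmetric dilation [[0, Z], [Z^T, 0]] of a real n1 x n2 matrix Z (hypothetical helper)."""
    n1, n2 = Z.shape
    return np.block([[np.zeros((n1, n1)), Z],
                     [Z.T, np.zeros((n2, n2))]])

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 5))

sv_Z = np.linalg.svd(Z, compute_uv=False)              # 3 values, largest first
sv_dil = np.linalg.svd(dilation(Z), compute_uv=False)  # 8 values, largest first

# Predicted pattern: every singular value of Z twice, then n2 - n1 = 2 zeros.
expected = np.concatenate([np.repeat(sv_Z, 2), np.zeros(2)])
dilation_matches = np.allclose(sv_dil, expected)
```

This is why the sums over $2n_1$ singular values of the dilation are exactly twice the sums over the $n_1$ singular values of the original matrix.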
2.2 Proof of the singular value inequality for symmetric matrices

For any $X \in S^n(\mathbb{R})$, the set of $n \times n$ real symmetric matrices, it is easy to see that the singular values of $X$ are
$|\lambda_1(X)|, |\lambda_2(X)|, \dots, |\lambda_n(X)|$, where $\lambda_i(\cdot)$ denotes the $i$-th largest eigenvalue. However, $|\lambda_i(X)| \ne \sigma_i(X)$
in general, since some of the eigenvalues may be negative but the order of the singular values
depends only on the magnitudes of the eigenvalues. We say that $\Pi^X = (\Gamma^X)^{-1}$ is an eigen-sorting
permutation of $X$ if

$$\left| \lambda_{\Pi^X_i}(X) \right| = \sigma_i(X) \quad \forall i = 1, 2, \dots, n.$$

Note that the eigen-sorting permutation is not unique, due to the multiplicities of eigenvalues or because
negative and positive eigenvalues may have the same magnitude. However, since for any two different
eigen-sorting permutations $\Pi^X$ and $(\Pi^X)'$ of $X$ we have $|\lambda_{\Pi^X_i}(X)| = |\lambda_{(\Pi^X)'_i}(X)|$ for $i = 1, \dots, n$,
inequality (1) can be equivalently written as

$$\sum_{i=1}^{n} \left| \lambda_{\Pi^{A-B}_i}(A-B) \right|^p \ge \sum_{i=1}^{n} \left| \, |\lambda_{\Pi^A_i}(A)|^p - |\lambda_{\Pi^B_i}(B)|^p \right|. \tag{9}$$
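To make the definition concrete, here is a small sketch, assuming NumPy; the function name `eigen_sorting_permutation` is hypothetical and indices are 0-based rather than the 1-based indices of the text. It lists the eigenvalues in decreasing order and then sorts their positions by decreasing magnitude, which yields one valid eigen-sorting permutation (ties are broken by position).

```python
import numpy as np

def eigen_sorting_permutation(X):
    """Return (lam, perm): eigenvalues largest first, and a 0-based permutation
    perm such that |lam[perm[i]]| is the (i+1)-th largest singular value of X."""
    lam = np.linalg.eigvalsh(X)[::-1]                # lambda_1 >= ... >= lambda_n
    perm = np.argsort(-np.abs(lam), kind="stable")   # sort positions by magnitude
    return lam, perm

X = np.diag([3.0, -5.0, 1.0])
lam, perm = eigen_sorting_permutation(X)
# lam is (3, 1, -5); the magnitudes sorted decreasingly are |-5|, |3|, |1|,
# so perm picks positions 2, 0, 1.
```

In this example the largest eigenvalue is 3 but the largest singular value is 5, which is exactly the ordering mismatch that the permutation records.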



Lemma 2. Let $M \in S^n(\mathbb{R})$ be non-singular and $N \in S^n(\mathbb{R})$ be sufficiently small. Then $M + N$ is
also non-singular, and for any eigen-sorting permutation $\Pi^{M+N}$ of $M + N$ there exists an eigen-sorting
permutation $\Pi^M$ of $M$ such that $\Pi^M = \Pi^{M+N}$.

Proof: Suppose there are $l$ distinct singular values of $M$, taking values $\sigma_1 > \cdots > \sigma_l > 0$. Define
$\mathcal{I}_t = \{ i : |\lambda_i(M)| = \sigma_t \}$, for $t = 1, \dots, l$. Then any permutation $\Pi = (\Gamma)^{-1}$ is an eigen-sorting
permutation of $M$ if and only if $\Gamma(\mathcal{I}_t)$ is a set of consecutive integers for any $t = 1, \dots, l$ and $\Gamma_{i_1} < \cdots < \Gamma_{i_l}$
for any $i_1 \in \mathcal{I}_1, \dots, i_l \in \mathcal{I}_l$.

Let $d = d(M) = \min_{i=1,\dots,l} (\sigma_i - \sigma_{i+1})$ be the minimum gap between distinct singular values and between
singular values and zero, where $\sigma_{l+1} = 0$. Let $N$ be sufficiently small so that $\sigma_1(N) < \frac{d}{2}$. Then by
corollary 4.9 of [19],

$$\lambda_i(M+N) \in [\lambda_i(M) + \lambda_n(N), \ \lambda_i(M) + \lambda_1(N)] \subseteq [\lambda_i(M) - \sigma_1(N), \ \lambda_i(M) + \sigma_1(N)] \subset \left( \lambda_i(M) - \frac{d}{2}, \ \lambda_i(M) + \frac{d}{2} \right).$$

By the definition of $d$, $0 \notin \left( \lambda_i(M) - \frac{d}{2}, \lambda_i(M) + \frac{d}{2} \right)$ for any $i = 1, \dots, n$, so $M + N$ is non-singular.
Also, by the triangle inequality we have that for each $i = 1, \dots, n$, $\big| |\lambda_i(M+N)| - |\lambda_i(M)| \big| < \frac{d}{2}$, and
this implies for each $t = 1, \dots, l$,

$$\sigma_t - \frac{d}{2} < |\lambda_i(M+N)| < \sigma_t + \frac{d}{2} \quad \forall i \in \mathcal{I}_t. \tag{10}$$

Choosing any eigen-sorting permutation $\Pi^{M+N}$ of $M + N$, with $\Gamma^{M+N} = (\Pi^{M+N})^{-1}$, since $(\sigma_j - \sigma_{j+1}) \ge d$, then for any $i_1 \in \mathcal{I}_1, \dots, i_l \in \mathcal{I}_l$,

$$\sigma_l - \frac{d}{2} < \sigma_{\Gamma^{M+N}_{i_l}}(M+N) < \sigma_l + \frac{d}{2} \le \cdots \le \sigma_1 - \frac{d}{2} < \sigma_{\Gamma^{M+N}_{i_1}}(M+N) < \sigma_1 + \frac{d}{2} \tag{11}$$

and hence $\Gamma^{M+N}_{i_1} < \cdots < \Gamma^{M+N}_{i_l}$. Now suppose $j_1, j_3 \in \Gamma^{M+N}(\mathcal{I}_t)$ for some $t = 1, \dots, l$ and
$j_2 \notin \Gamma^{M+N}(\mathcal{I}_t)$ is another integer such that $j_1 < j_2 < j_3$, and therefore $\sigma_{j_3}(M+N) \le \sigma_{j_2}(M+N) \le \sigma_{j_1}(M+N)$. If
$j_2 \in \Gamma^{M+N}(\mathcal{I}_{t'})$ for some $t' > t$ (the case of $t' < t$ is similar), then by (11)

$$\sigma_{j_2}(M+N) < \sigma_{j_3}(M+N), \tag{12}$$

which is a contradiction, and hence $\Gamma^{M+N}(\mathcal{I}_t)$ is a set of consecutive integers for any $t = 1, \dots, l$. □

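Since now $A, B$ are symmetric, they can be diagonalized by orthogonal matrices and all the eigenvalues
are real. Write $A = U_1 A_0 U_1^T$, $B = U_2 B_0 U_2^T$, where $A_0 = \mathrm{diag}(\lambda_1(A), \dots, \lambda_n(A))$ and
$B_0 = \mathrm{diag}(\lambda_1(B), \dots, \lambda_n(B))$. Let $V = U_1^T U_2$ and let $\mathcal{O}$ be the set of all orthogonal matrices. Then we have

$$\sum_{i=1}^{n} \sigma_i^p(A-B) = \sum_{i=1}^{n} \sigma_i^p(A_0 - V B_0 V^T) \ge \inf_{U \in \mathcal{O}} \sum_{i=1}^{n} \sigma_i^p(A_0 - U B_0 U^T). \tag{13}$$

It is well known that $\mathcal{O}$ is compact and therefore there is a minimizer $V_0 \in \mathcal{O}$ achieving the above
infimum, thus

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \min_{U \in \mathcal{O}} \sum_{i=1}^{n} \sigma_i^p(A_0 - U B_0 U^T) = \sum_{i=1}^{n} \sigma_i^p(A_0 - V_0 B_0 V_0^T). \tag{14}$$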

Now we are going to prove two theorems which will be used to prove our main theorem later. One
of them is about the matrix perturbation of the Schatten-p quasi-norm, $p \in (0,1)$, and the other one
reveals an important property of the minimizer $V_0$. Before proving them, we need to introduce a new
function. Let $M \in S^n(\mathbb{R})$ be non-singular. Then, for some orthogonal matrix $W$,

$$M = W \, \mathrm{diag}\big( \lambda_1(M), \dots, \lambda_n(M) \big) W^T = W \, \mathrm{diag}\big( \mathrm{sgn}(\lambda_1(M)) \, \sigma_{\Gamma^M_1}(M), \dots, \mathrm{sgn}(\lambda_n(M)) \, \sigma_{\Gamma^M_n}(M) \big) W^T,$$

where $\mathrm{sgn}(\cdot)$ is the sign function and $\Gamma^M = (\Pi^M)^{-1}$. Define $S_p$ to be the function on the set of
non-singular $n \times n$ real symmetric matrices given by

$$S_p(M) = W \, \mathrm{diag}(s_1, \dots, s_n) \, W^T, \tag{15}$$

where $s_i = \mathrm{sgn}(\lambda_i(M)) \, \sigma^{p-1}_{\Gamma^M_i}(M)$.



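Theorem 2. Let $p \in (0,1)$ and $M, N \in S^n(\mathbb{R})$ with $M$ being non-singular. Suppose $\mathrm{Tr}(N S_p(M)) < 0$
(resp. $\mathrm{Tr}(N S_p(M)) > 0$). Then for sufficiently small $\epsilon > 0$ and $x \in (0, \epsilon)$,

$$\sum_{i=1}^{n} \sigma_i^p(M + xN) < \sum_{i=1}^{n} \sigma_i^p(M) \quad \left( \text{resp. } \sum_{i=1}^{n} \sigma_i^p(M + xN) > \sum_{i=1}^{n} \sigma_i^p(M) \right). \tag{16}$$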



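Proof: Let $M \in S^n(\mathbb{R})$ have eigen-decomposition $M = W \mathrm{diag}(\lambda_1(M), \dots, \lambda_n(M)) W^T$, and let $u_i$ be the
normalized eigenvector corresponding to $\lambda_i(M)$. By lemma 2 and the positive homogeneity of singular
values, we can pick a sufficiently small $\epsilon > 0$ so that for any $x \in (0, \epsilon)$, $\mathrm{sgn}(\lambda_i(M + xN)) = \mathrm{sgn}(\lambda_i(M))$
$\forall i = 1, 2, \dots, n$, and for any eigen-sorting permutation $\Pi^{M+xN}$ of $M + xN$ there exists an eigen-sorting
permutation $\Pi^M$ of $M$ such that $\Pi^{M+xN} = \Pi^M$. Suppose for now that the eigenvalues of $M$
are distinct. Then, by [19, p. 185, Theorem 2.3], we have

$$\lambda_i(M + xN) = \lambda_i(M) + x \, u_i^T N u_i + O(\epsilon^2) \quad \forall i = 1, 2, \dots, n, \tag{17}$$

where the $O(\epsilon^2)$ term depends only on $N$ but not on $M$ (see [19, p. 185, Theorem 2.3] for details).
Now, using (17), we have

$$\sum_{i=1}^{n} \sigma_i^p(M + xN) = \sum_{i=1}^{n} \left| \lambda_{\Pi^{M+xN}_i}(M + xN) \right|^p = \sum_{i=1}^{n} \left[ \mathrm{sgn}(\lambda_{\Pi^M_i}(M + xN)) \, \lambda_{\Pi^M_i}(M + xN) \right]^p$$

$$= \sum_{i=1}^{n} \left[ \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \left( \lambda_{\Pi^M_i}(M) + x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2) \right) \right]^p$$

$$= \sum_{i=1}^{n} \left[ \sigma_i(M) + \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2) \right]^p.$$

Here we remark that although the last $O(\epsilon^2)$ term is the product of a sign function and the $O(\epsilon^2)$ term
in the second line, it is still independent of the magnitudes of the eigenvalues of $M$.

By the generalized binomial theorem (see [3] for details),

$$\left[ \sigma_i(M) + \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2) \right]^p$$

$$= \sigma_i^p(M) + p \, \sigma_i^{p-1}(M) \, \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + \sum_{k \ge 2} C^p_k \, \sigma_i^{p-k}(M) \left[ \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} \right]^k$$

$$\quad + \sum_{k \ge 1} C^p_k \left[ \sigma_i(M) + \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} \right]^{p-k} \left[ O(\epsilon^2) \right]^k, \tag{18}$$

where $C^p_k$ is the generalized binomial coefficient. Since $x = O(\epsilon)$, we have

$$\left[ \sigma_i(M) + \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2) \right]^p = \sigma_i^p(M) + p \, \sigma_i^{p-1}(M) \, \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, x \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2). \tag{19}$$

Thus,

$$\sum_{i=1}^{n} \sigma_i^p(M + xN) = \sum_{i=1}^{n} \sigma_i^p(M) + p x \sum_{i=1}^{n} \sigma_i^{p-1}(M) \, \mathrm{sgn}(\lambda_{\Pi^M_i}(M)) \, u_{\Pi^M_i}^T N u_{\Pi^M_i} + O(\epsilon^2)$$

$$= \sum_{i=1}^{n} \sigma_i^p(M) + p x \sum_{j=1}^{n} \sigma^{p-1}_{\Gamma^M_j}(M) \, \mathrm{sgn}(\lambda_j(M)) \, u_j^T N u_j + O(\epsilon^2)$$

$$= \sum_{i=1}^{n} \sigma_i^p(M) + p x \sum_{j=1}^{n} s_j \, u_j^T N u_j + O(\epsilon^2)$$

$$= \sum_{i=1}^{n} \sigma_i^p(M) + p x \sum_{j=1}^{n} u_j^T N S_p(M) u_j + O(\epsilon^2)$$

$$= \sum_{i=1}^{n} \sigma_i^p(M) + p x \, \mathrm{Tr}(N S_p(M)) + O(\epsilon^2)$$

$$< \sum_{i=1}^{n} \sigma_i^p(M),$$

where the last inequality follows from the supposition that $\mathrm{Tr}(N S_p(M)) < 0$.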

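To handle the case where $M$ has repeated eigenvalues, we pick a sequence $\{M^j\}$ of real symmetric
matrices with distinct eigenvalues such that $M^j \to M$. Since $M$ is non-singular, by lemma 2, we
have that for each $j = 1, 2, \dots$, $M^j$ is non-singular, $\Pi^{M^j} = \Pi^M$ and $\mathrm{sgn}(\lambda_i(M)) = \mathrm{sgn}(\lambda_i(M^j))$ for
all $i = 1, \dots, n$. By adjusting $\epsilon > 0$ if necessary, we can further ensure that for any $j = 1, 2, \dots$,
$i = 1, \dots, n$ and $x \in (0, \epsilon)$, the quantity $\sigma_i(M^j) + \mathrm{sgn}(\lambda_{\Pi^{M^j}_i}(M^j)) \, x \, u_{\Pi^{M^j}_i}^T N u_{\Pi^{M^j}_i}$ is uniformly (in $j$) bounded below
by a positive constant. Hence, by [18, lemma 4.3], the two series in (18), evaluated at $M^j$, satisfy

$$\sum_{k \ge 2} C^p_k \, \sigma_i^{p-k}(M^j) \left[ \mathrm{sgn}(\lambda_{\Pi^{M^j}_i}(M^j)) \, x \, u_{\Pi^{M^j}_i}^T N u_{\Pi^{M^j}_i} \right]^k \to O(\epsilon^2) \tag{20}$$

and

$$\sum_{k \ge 1} C^p_k \left[ \sigma_i(M^j) + \mathrm{sgn}(\lambda_{\Pi^{M^j}_i}(M^j)) \, x \, u_{\Pi^{M^j}_i}^T N u_{\Pi^{M^j}_i} \right]^{p-k} \left[ O(\epsilon^2) \right]^k \to O(\epsilon^2) \tag{21}$$

as $j \to \infty$. Hence we, again, get (19) and this completes the proof. □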
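The first-order expansion in the proof says that the directional derivative of $\sum_i \sigma_i^p(M + xN)$ at $x = 0$ is $p \, \mathrm{Tr}(N S_p(M))$. The sketch below, assuming NumPy, compares this prediction with a finite difference. The helper names, the chosen eigenvalues and the step size are illustrative assumptions, not taken from the paper; since $\sigma_{\Gamma^M_i}(M) = |\lambda_i(M)|$, the function `S_p` can be formed directly from an eigen-decomposition.

```python
import numpy as np

def schatten_p_sum(M, p):
    """Sum of the p-th powers of the singular values of M."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False) ** p))

def S_p(M, p):
    """W diag(sgn(lambda_i) |lambda_i|^(p-1)) W^T for non-singular symmetric M."""
    lam, W = np.linalg.eigh(M)
    s = np.sign(lam) * np.abs(lam) ** (p - 1)
    return (W * s) @ W.T          # scales column i of W by s[i]

rng = np.random.default_rng(1)
p = 0.5
# Non-singular symmetric M with distinct eigenvalues of mixed sign (chosen for illustration).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
M = Q @ np.diag([2.0, -1.0, 0.5, -3.0]) @ Q.T
N = rng.standard_normal((4, 4))
N = (N + N.T) / 2                 # symmetric perturbation direction

x = 1e-6
finite_diff = (schatten_p_sum(M + x * N, p) - schatten_p_sum(M, p)) / x
predicted = p * np.trace(N @ S_p(M, p))
```

When `predicted` is negative, the sum of $p$-th powers decreases for small positive $x$, which is the statement of theorem 2.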
Theorem 3. Define $B_1 = V_0 B_0 V_0^T$ and $C_0 = A_0 - B_1$. Then $B_1$ and $C_0$ commute if $C_0$ is non-singular.

Proof: It is well-known that for any two matrices in $S^n(\mathbb{R})$ commutativity is equivalent to
co-diagonalizability, see [3]. Thus

$B_1$ and $C_0$ commute if and only if $B_1$ and $C_0$ are co-diagonalizable
if and only if $B_1$ and $S_p(C_0)$ are co-diagonalizable
if and only if $B_1$ and $S_p(C_0)$ commute,

since the function $S_p(\cdot)$ does not change the diagonalizing orthogonal matrix but only the eigenvalues.
Hence we need to show that $S_p(C_0) B_1 = B_1 S_p(C_0)$. Suppose, to the contrary, that $D = S_p(C_0) B_1 - B_1 S_p(C_0) \ne 0$. Observe
that

$$D^T = \left( S_p(C_0) B_1 - B_1 S_p(C_0) \right)^T = B_1^T S_p(C_0)^T - S_p(C_0)^T B_1^T = B_1 S_p(C_0) - S_p(C_0) B_1 = -D.$$

$D$ is skew-symmetric and hence $V(\epsilon) = \exp(\epsilon D) \in \mathcal{O}$ $\forall \epsilon \in \mathbb{R}$. Let $\epsilon$ be defined as in the proof of
theorem 2. Using an asymptotic argument similar to that in the proof of theorem 2, we have

$$\sum_{i=1}^{n} \sigma_i^p(A_0 - V(\epsilon) B_1 V(\epsilon)^T) = \sum_{i=1}^{n} \sigma_i^p\big( A_0 - (I + \epsilon D + \cdots) B_1 (I + \epsilon D + \cdots)^T \big)$$

$$= \sum_{i=1}^{n} \sigma_i^p\big( A_0 - (I + \epsilon D) B_1 (I - \epsilon D) \big) + O(\epsilon^2)$$

$$= \sum_{i=1}^{n} \sigma_i^p\big( A_0 - B_1 + \epsilon (B_1 D - D B_1) \big) + O(\epsilon^2).$$

Since

$$\mathrm{Tr}\big( (B_1 D - D B_1) S_p(C_0) \big) = \mathrm{Tr}(B_1 D S_p(C_0)) - \mathrm{Tr}(D B_1 S_p(C_0))$$

$$= \mathrm{Tr}(D S_p(C_0) B_1) - \mathrm{Tr}(D B_1 S_p(C_0))$$

$$= \mathrm{Tr}\big( D (S_p(C_0) B_1 - B_1 S_p(C_0)) \big) = \mathrm{Tr}(D^2)$$

$$= \mathrm{Tr}(D(-D^T)) = -\|D\|_F^2 < 0,$$

theorem 2 gives

$$\sum_{i=1}^{n} \sigma_i^p(A_0 - V(\epsilon) B_1 V(\epsilon)^T) < \sum_{i=1}^{n} \sigma_i^p(A_0 - B_1), \tag{22}$$

which contradicts the minimality of $V_0$ in (14), because $V(\epsilon) V_0 \in \mathcal{O}$. Thus $S_p(C_0) B_1 = B_1 S_p(C_0)$.

□



With the help of the following lemma, we can gain some insight into the minimizer $V_0$.

Lemma 3. Let $X, Y \in S^n(\mathbb{R})$. Suppose that $X$ is diagonal with distinct diagonal entries and
commutes with $Y$, i.e. $XY = YX$. Then $Y$ is also diagonal.

Proof: Let $X = (x_{ij})$, $Y = (y_{ij})$. Then $(XY)_{ij} = (YX)_{ij}$, that is,

$$\sum_{k=1}^{n} x_{ik} y_{kj} = \sum_{l=1}^{n} y_{il} x_{lj}.$$

Since $X$ is diagonal, this reads $x_{ii} y_{ij} = y_{ij} x_{jj}$. As the diagonal entries of $X$ are distinct, $y_{ij} = 0$ if $i \ne j$. Thus $Y$ is diagonal. □

From theorem 3, we know that $C_0 B_1 = B_1 C_0$. By the definition of $C_0$, we have $A_0 B_1 = B_1 A_0$. If we
assume that the eigenvalues of $A$ are all simple, the diagonal entries of $A_0$ are all distinct. Then by the
above lemma, $B_1 = V_0 B_0 V_0^T$ is also diagonal. This implies $B_1 = \mathrm{diag}(\lambda_{\Pi_1}(B), \dots, \lambda_{\Pi_n}(B))$ for some
permutation $\Pi$. Geometrically, this is saying that in order to attain the minimum in (14) it is necessary
for $A$ and $B$ to have the same set of principal axes. Before we prove the main result of this article,
we need two more lemmas. We skip the proof of lemma 4, as the idea of the proof is straightforward.

Lemma 4. Let $u, v \in \mathbb{R}^n$ be two non-negative (entry-wise) vectors such that $u_1 \ge u_2 \ge \cdots \ge u_n \ge 0$,
$v_1 \ge v_2 \ge \cdots \ge v_n \ge 0$, and $P$ be any $n \times n$ permutation matrix. Then $\|u - v\|_1 \le \|u - Pv\|_1$.
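Since the proof of lemma 4 is skipped, a brute-force check on a small instance may help build intuition. The sketch below uses only the Python standard library; the vectors `u` and `v` are made-up examples. It enumerates all permutations of `v` and confirms that the identically sorted pairing gives the smallest $\ell_1$ distance.

```python
from itertools import permutations

def l1_dist(u, v):
    """l1 distance between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Both vectors are non-negative and sorted in decreasing order, as in lemma 4.
u = [5.0, 3.0, 2.0, 0.5]
v = [4.0, 4.0, 1.0, 0.0]

sorted_dist = l1_dist(u, v)
min_over_perms = min(l1_dist(u, pv) for pv in permutations(v))
```

This is only a sanity check on one instance, not a proof.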

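Lemma 5. Let $\mathcal{A}$ be a subspace and $\mathcal{B}$ be a compact subset of the same Hilbert space $\mathcal{H}$. Suppose
$g : \mathcal{A} \times \mathcal{B} \to \mathbb{R}$ is bounded below and continuous on $\mathcal{A} \times \mathcal{B}$. Then $\bar{g}(a) = \min_{b \in \mathcal{B}} g(a, b)$ is continuous.

Proof: For any given $a_0 \in \mathcal{A}$ and $\epsilon > 0$, let $\bar{g}(a_0) = \bar{g}$ and $b_0$ be a minimizer achieving $\bar{g}$. By the
continuity of $g$, there exists a neighbourhood $\mathcal{N}_{a_0} \subseteq \mathcal{A}$ about $a_0$ such that $g(a, b_0) < \bar{g} + \epsilon$ $\forall a \in \mathcal{N}_{a_0}$.
Hence $\bar{g}(a) < \bar{g} + \epsilon$.

For the other direction, again by the continuity of $g$, for any $b \in \mathcal{B}$ there exists a neighbourhood
$\mathcal{N}_{(a_0,b)} \subseteq \mathcal{A} \times \mathcal{B}$ about $(a_0, b)$ such that

$$g(a', b') > g(a_0, b) - \epsilon \ge \bar{g} - \epsilon \quad \forall (a', b') \in \mathcal{N}_{(a_0,b)}. \tag{23}$$

Let $\{U \times V\}$ be a base of $\mathcal{A} \times \mathcal{B}$. By definition, for each $\mathcal{N}_{(a_0,b)}$ there exists a base element $U_b \times V_b \subseteq \mathcal{N}_{(a_0,b)}$
such that $a_0 \in U_b$ and $b \in V_b$. Observe that $\{V_b\}_{b \in \mathcal{B}}$ forms an open cover of $\mathcal{B}$. Since $\mathcal{B}$ is compact,
there exists a finite set of points $\mathcal{B}' \subset \mathcal{B}$ such that $\{V_{b'}\}_{b' \in \mathcal{B}'}$ is a finite subcover of $\mathcal{B}$. Now, let
$\mathcal{N}'_{a_0} = \bigcap_{b' \in \mathcal{B}'} U_{b'}$. Clearly, $\mathcal{N}'_{a_0} \subseteq \mathcal{A}$ is an open set containing $a_0$. Moreover, we have from (23) that
$g(a', b) > \bar{g} - \epsilon$ for all $a' \in \mathcal{N}'_{a_0}$ and $b \in \mathcal{B}$. It follows that $\bar{g}(a') > \bar{g} - \epsilon$ for all $a' \in \mathcal{N}'_{a_0}$, as desired.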

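Theorem 4. Let $A, B \in S^n(\mathbb{R})$ and $p \in (0,1)$. Then

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|. \tag{24}$$

Proof: We first assume that the eigenvalues of $A$ are all distinct and $C_0$ is non-singular. Then

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \sigma_i^p(A_0 - V_0 B_0 V_0^T)$$

$$= \sum_{i=1}^{n} \sigma_i^p\big( A_0 - \mathrm{diag}(\lambda_{\Pi_1}(B), \dots, \lambda_{\Pi_n}(B)) \big)$$

$$= \sum_{i=1}^{n} \left| \lambda_i(A) - \lambda_{\Pi_i}(B) \right|^p \tag{25}$$

$$\ge \sum_{i=1}^{n} \left| \, |\lambda_i(A)|^p - |\lambda_{\Pi_i}(B)|^p \right|. \tag{26}$$

By lemma 4,

$$\sum_{i=1}^{n} \left| \, |\lambda_i(A)|^p - |\lambda_{\Pi_i}(B)|^p \right| \ge \sum_{i=1}^{n} \left| \, |\lambda_{\Pi^A_i}(A)|^p - |\lambda_{\Pi^B_i}(B)|^p \right|. \tag{27}$$

Hence

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right|. \tag{28}$$

Next we define $F : S^n(\mathbb{R}) \times \mathcal{O} \to \mathbb{R}$ to be the function given by

$$F(A, U) = \sum_{i=1}^{n} \sigma_i^p(A_0 - U B_0 U^T) \tag{29}$$

and $\bar{F} : S^n(\mathbb{R}) \to \mathbb{R}$ to be the function given by

$$\bar{F}(A) = \min_{U \in \mathcal{O}} F(A, U) = \min_{U \in \mathcal{O}} \sum_{i=1}^{n} \sigma_i^p(A_0 - U B_0 U^T). \tag{30}$$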
It is easy to see that $(A, U) \mapsto (A_0 - U B_0 U^T)$ is continuous with respect to the product topology,
not only separately continuous in each variable, and since both singular values and the power function are
continuous, $F$ is continuous on the product space $S^n(\mathbb{R}) \times \mathcal{O}$. By lemma 5, $\bar{F}$ is continuous in $A$.

For the case that $C_0$ is singular and/or $A$ has repeated eigenvalues, pick a sequence $\{A^j\}_{j=1}^{\infty}$ of real
symmetric matrices with distinct eigenvalues such that $A^j \to A$ as $j \to \infty$, and the corresponding $C_0^j$ is non-singular
for all $j$. Then by (25) and (26), we have for each $j = 1, 2, \dots$

$$\sum_{i=1}^{n} \sigma_i^p(A^j - B) \ge \sum_{i=1}^{n} \sigma_i^p(A^j_0 - V_j B_0 V_j^T)$$

$$\ge \sum_{i=1}^{n} \left| \, |\lambda_i(A^j)|^p - |\lambda_{\Pi_i}(B)|^p \right|$$

$$\ge \sum_{i=1}^{n} \left| \, |\lambda_{\Pi^{A^j}_i}(A^j)|^p - |\lambda_{\Pi^B_i}(B)|^p \right|, \tag{31}$$

where $V_j$ is a minimizer for $A^j$, the permutation $\Pi$ in the second line may depend on $j$, and the last inequality follows
from lemma 4. Since the inequality (31) depends continuously on $A^j$,
and each $\Pi^{A^j}$ can be chosen to equal $\Pi^A$ by lemma 2, taking the limit $j \to \infty$, we have

$$\sum_{i=1}^{n} \sigma_i^p(A-B) \ge \sum_{i=1}^{n} \left| \sigma_i^p(A) - \sigma_i^p(B) \right| \tag{32}$$

and thus complete the proof. □

Combining the results of subsections 2.1 and 2.2, we completely resolve in the affirmative the
conjecture (1).
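As an informal sanity check of inequality (1), the following sketch (assuming NumPy; the function name `lhs_rhs`, the matrix sizes and the sampling ranges are illustrative choices) evaluates both sides on random rectangular matrices and records the smallest observed gap between the left-hand and right-hand sides.

```python
import numpy as np

def lhs_rhs(A, B, p):
    """Both sides of inequality (1) for matrices A, B of the same shape."""
    s = lambda X: np.linalg.svd(X, compute_uv=False)   # singular values, largest first
    lhs = np.sum(s(A - B) ** p)
    rhs = np.sum(np.abs(s(A) ** p - s(B) ** p))
    return lhs, rhs

rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(200):
    n1, n2 = rng.integers(1, 6, size=2)     # random shapes between 1x1 and 5x5
    A = rng.standard_normal((n1, n2))
    B = rng.standard_normal((n1, n2))
    p = rng.uniform(0.05, 0.95)
    lhs, rhs = lhs_rhs(A, B, p)
    worst_gap = min(worst_gap, lhs - rhs)
```

A non-negative `worst_gap` (up to rounding) is consistent with the theorem. Random trials of this kind cannot replace the proof.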

2.3 A generalization of the singular value inequality 

As a by-product of the study of the above singular value inequality, we prove that a more general
version of theorem 1 holds, and therefore give a partial answer to the question recently raised by
Miao, see conjecture 7 in [5].

Theorem 5. Let $A, B \in \mathbb{R}^{n_1 \times n_2}$, $n = \min\{n_1, n_2\}$ and $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave continuously
differentiable function with $f(0) = 0$. Then

$$\sum_{i=1}^{n} f(\sigma_i(A - B)) \ge \sum_{i=1}^{n} \left| f(\sigma_i(A)) - f(\sigma_i(B)) \right|. \tag{33}$$

We omit the proof of theorem 5 because the arguments and tricks are almost the same as those for
proving theorem 1. Some appropriate modifications are needed, though. First, instead of using the
generalized binomial theorem, we expand $f$ by Taylor's expansion to handle the second order term
$O(\epsilon^2)$, and therefore we need $f$ to be first-order differentiable. Second, we need to extend the definition
of the function $S_p$ so that the new definition is compatible with the argument in theorem 2. Under the
same context as in the definition of $S_p$, define $A_f$ to be the function given by

$$A_f(M) = W \, \mathrm{diag}(s_1, \dots, s_n) \, W^T, \tag{34}$$

where $s_i = \mathrm{sgn}(\lambda_i(M)) \, f'(\sigma_{\Gamma^M_i}(M))$. For theorem 4, $f(x) = |x|^p$ and $A_f = p S_p$. Finally, it is easy to
see that any concave function $f : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(0) = 0$ is non-decreasing and subadditive,
i.e. $f(x + y) \le f(x) + f(y)$ $\forall x, y \ge 0$. Thus, we can use the following lemma to achieve an inequality
similar to (26). Its proof can be found in [5].



Lemma 6. Let $u = (u_1, \dots, u_n), v = (v_1, \dots, v_n) \in \mathbb{C}^n$. Let, component-wise, $x = |u|$, $y = |v|$ and
$z = |u - v|$. For any non-negative, monotonically increasing, subadditive function $f$,



J2M)>J2\f( X tr)-f(yf)\ Vfc = l,2, 



, n 



(35) 



i=i 



i=l 



3 Applications 

The goal of this section is to establish conditions sufficient for recovering a low rank matrix using the
following program $(P_q)$:

minimize $\|X\|_q^q$
subject to $\mathcal{A}(X) = y$

where $X \in \mathbb{R}^{n_1 \times n_2}$ is the decision variable, the linear map $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ and the vector $y \in \mathbb{R}^m$ are
given, and $\| \cdot \|_q$ denotes the Schatten-q quasi-norm, so that $\|X\|_q^q = \sum_i \sigma_i^q(X)$.
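For concreteness, the objective of $(P_q)$ can be evaluated as follows. This is a minimal sketch assuming NumPy; the function name `schatten_q_power` is hypothetical and the example matrix is chosen so the values are easy to verify by hand.

```python
import numpy as np

def schatten_q_power(X, q):
    """Return ||X||_q^q, the sum of the q-th powers of the singular values, for 0 < q <= 1."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** q))

# Singular values are 4, 1 and 0.
X = np.diag([4.0, 1.0, 0.0])
half_power = schatten_q_power(X, 0.5)   # sqrt(4) + sqrt(1) + 0
nuclear = schatten_q_power(X, 1.0)      # q = 1 gives the nuclear norm: 4 + 1 + 0
```

For $q = 1$ the objective is the convex nuclear norm; for $q < 1$ it is non-convex, which is why the recovery analysis below needs inequality (1).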
3.1 A null-space characterization of successful recovery using $(P_q)$

We first give a simple application of the inequality (1) by proving a null-space-based necessary and
sufficient condition for low rank matrix recovery using program $(P_q)$. The result is analogous to
theorem 1 in [5] by Wang, Xu and Tang, with vectors replaced by matrices, and is also analogous to
lemma 6 in [10] by Oymak and Hassibi, with the nuclear norm replaced by the Schatten-q quasi-norm.

For any $T \subseteq \{1, 2, \dots, n\}$, $v \in \mathbb{R}^n$, let $v_T$ be the vector such that $v_T(i) = v(i)$ $\forall i \in T$, and
$v_T(i) = 0$ otherwise. For any $Y \in \mathbb{R}^{n_1 \times n_2}$ with singular value decomposition $U_Y [\mathrm{diag}(y) \ 0] V_Y^T$, let
$Y_T = U_Y [\mathrm{diag}(y_T) \ 0] V_Y^T$.

Theorem 6. Let $q \in (0, 1]$, $X_0 \in \mathbb{R}^{n_1 \times n_2}$ be such that $\mathrm{Rank}(X_0) = K$, $n = \min\{n_1, n_2\}$, and $X^*$ be
the unique solution to the program $(P_q)$ with $y = \mathcal{A} X_0$. Then $X^* = X_0$ if and only if

$$\sum_{i=1}^{K} \sigma_i^q(W) < \sum_{i=K+1}^{n} \sigma_i^q(W) \tag{36}$$

for any $W \in \mathrm{Ker}(\mathcal{A}) \setminus \{0\}$, where $\mathrm{Ker}(\mathcal{A}) = \{ H \mid \mathcal{A} H = 0 \}$.

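Proof: Let $X_0 = U_{X_0} [\mathrm{diag}(x_0) \ 0] V_{X_0}^T$ be the singular value decomposition of $X_0$ and $S_0 = \{ i \mid x_0(i) \ne 0 \}$
be the support of $x_0$. Since the singular values are sorted in decreasing order of magnitude
and $\mathrm{Rank}(X_0) = K$, $S_0 = \{1, 2, \dots, K\}$.

Suppose $\sum_{i=1}^{K} \sigma_i^q(W) < \sum_{i=K+1}^{n} \sigma_i^q(W)$ for any $W \in \mathrm{Ker}(\mathcal{A}) \setminus \{0\}$. Using theorem 1,

$$\|X_0 + W\|_q^q \ge \sum_{i=1}^{n} \left| \sigma_i^q(X_0) - \sigma_i^q(W) \right|$$

$$= \sum_{i \in S_0} \left| \sigma_i^q(X_0) - \sigma_i^q(W) \right| + \sum_{i \in S_0^c} \left| \sigma_i^q(X_0) - \sigma_i^q(W) \right|$$

$$\ge \sum_{i \in S_0} \sigma_i^q(X_0) - \sum_{i \in S_0} \sigma_i^q(W) + \sum_{i \in S_0^c} \sigma_i^q(W)$$

$$> \sum_{i \in S_0} \sigma_i^q(X_0) = \|X_0\|_q^q.$$

Hence every feasible point $X_0 + W \ne X_0$ has a strictly larger objective value, and $X^* = X_0$.

Conversely, suppose $\sum_{i=1}^{K} \sigma_i^q(W) \ge \sum_{i=K+1}^{n} \sigma_i^q(W)$ for some $W \in \mathrm{Ker}(\mathcal{A}) \setminus \{0\}$. Let $X_0 = -W_{S_0}$ and $X = W_{S_0^c}$. Then

$$\mathcal{A}(X - X_0) = \mathcal{A} W = 0.$$

Hence $\mathcal{A} X = \mathcal{A} X_0$. Also

$$\|X_0\|_q^q = \sum_{i \in S_0} \sigma_i^q(W) = \sum_{i=1}^{K} \sigma_i^q(W) \ge \sum_{i=K+1}^{n} \sigma_i^q(W) = \sum_{i \in S_0^c} \sigma_i^q(W) = \|X\|_q^q.$$

Thus $X_0$ is not the unique solution to the program $(P_q)$ with $y = \mathcal{A} X_0$.

□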



3.2 The scaled q-restricted isometry property

In order to analyze the low-rank matrix recovery problem using the nuclear norm minimization program,
Recht, Fazel and Parrilo [6] generalized the idea of the restricted isometry property for sparse vectors
developed by Candes and Tao [7] and gave a corresponding definition for the matrix case. Instead of
using the convex nuclear norm minimization program, we consider the non-convex program $(P_q)$ and
a different notion of restricted isometry constant for low-rank matrices.

Let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ be a linear operator, $L > 0$ be an integer, and $0 < q \le 1$, $r > 0$. $\delta_{L,r}$ is said
to be the restricted q-isometry constant with respect to the scaling factor $r$ if it is the smallest number
such that

$$(1 - \delta_{L,r}) \|Z\|_F^q \le \frac{1}{r} \|\mathcal{A} Z\|_q^q \le (1 + \delta_{L,r}) \|Z\|_F^q \tag{37}$$

for all $Z \in \mathbb{R}^{n_1 \times n_2}$ such that $\mathrm{Rank}(Z) \le L$. Here $\|\mathcal{A} Z\|_q$ is the $\ell_q$ quasi-norm of the vector $\mathcal{A} Z \in \mathbb{R}^m$.

This definition is a natural generalization of the restricted q-isometry constant for the sparse vector
recovery problem given by Chartrand and Staneva in [8].

3.3 RIP based sufficient recovery condition

Armed with (1) and mimicking the approach used by Chartrand and Staneva in [8], we prove the
following theorem. The approach and techniques used in the proof were first developed by Candes and
Tao [7] and often appear in the compressed sensing and matrix completion literature [13], [3]. Since
the sufficient condition is scale invariant, we omit the second subscript (scaling factor) of the isometry
constant in the proof.

Theorem 7. Let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ be a linear operator, $\mathrm{Rank}(X_0) \le K$, $y = \mathcal{A} X_0$, $0 < q \le 1$, $b > 1$,
and $a = \lceil b^{2/(2-q)} K \rceil / K$. Suppose $\mathcal{A}$ satisfies

$$\delta_{aK} + b \, \delta_{(a+1)K} < b - 1.$$

Then the minimizer of $(P_q)$ is exactly $X_0$.



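Proof: Let $X^*$ be a solution to $(P_q)$ and $H = X^* - X_0$. Our aim is to show that $H = 0$. Let
$X_0 = U_{X_0} [\mathrm{diag}(x_0) \ 0] V_{X_0}^T$ and $H = U_H [\mathrm{diag}(h) \ 0] V_H^T$ be the singular value decompositions of $X_0$ and $H$
respectively, and $S_0 = \{ i \mid x_0(i) \ne 0 \}$ be the support of $x_0$. Again by theorem 1,

$$\|X_0\|_q^q \ge \|X_0 + H\|_q^q \ge \sum_{i \in S_0} \left| x_0(i)^q - h(i)^q \right| + \sum_{i \in S_0^c} \left| x_0(i)^q - h(i)^q \right|$$

$$\ge \sum_{i \in S_0} x_0(i)^q - \sum_{i \in S_0} h(i)^q + \sum_{i \in S_0^c} h(i)^q,$$

and therefore

$$\|H_{S_0^c}\|_q^q \le \|H_{S_0}\|_q^q. \tag{38}$$

Let $L = aK$. Partition $S_0^c = S_1 \cup S_2 \cup \cdots \cup S_J$ such that $S_1$ contains the indices of the $L$ largest singular
values of $H_{S_0^c}$, $S_2$ contains the next $L$ largest, and so on. Note that each $S_j$ contains $L$ elements except
$S_J$, which possibly contains fewer than $L$ elements. Denote $S_{01} = S_0 \cup S_1$. Then

$$0 = \|\mathcal{A} X^* - \mathcal{A} X_0\|_q^q = \|\mathcal{A} H\|_q^q = \Big\| \mathcal{A} H_{S_{01}} + \sum_{j \ge 2} \mathcal{A} H_{S_j} \Big\|_q^q \ge \|\mathcal{A} H_{S_{01}}\|_q^q - \sum_{j \ge 2} \|\mathcal{A} H_{S_j}\|_q^q.$$

By the definition of the restricted q-isometry constant of linear operators, we have

$$0 \ge r(1 - \delta_{L+K}) \|H_{S_{01}}\|_F^q - r(1 + \delta_L) \sum_{j \ge 2} \|H_{S_j}\|_F^q. \tag{39}$$

Then we need to bound $\sum_{j \ge 2} \|H_{S_j}\|_F^q$. For each $k \in S_j$ and $i \in S_{j-1}$,

$$h(k) \le h(i), \qquad h(k)^q \le \frac{1}{L} \|H_{S_{j-1}}\|_q^q, \qquad \|H_{S_j}\|_F^2 \le L^{1 - \frac{2}{q}} \|H_{S_{j-1}}\|_q^2.$$

Hence

$$\sum_{j \ge 2} \|H_{S_j}\|_F^q \le \frac{1}{L^{1 - \frac{q}{2}}} \sum_{j \ge 1} \|H_{S_j}\|_q^q = \frac{1}{L^{1 - \frac{q}{2}}} \|H_{S_0^c}\|_q^q. \tag{40}$$

By Hölder's inequality,

$$\|H_{S_0}\|_q^q = \sum_{i \in S_0} h(i)^q \le \Big( \sum_{i \in S_0} 1 \Big)^{1 - \frac{q}{2}} \Big( \sum_{i \in S_0} h(i)^2 \Big)^{\frac{q}{2}} = K^{1 - \frac{q}{2}} \|H_{S_0}\|_F^q. \tag{41}$$

Using (38), (40) and (41), we get

$$\sum_{j \ge 2} \|H_{S_j}\|_F^q \le \frac{1}{L^{1 - \frac{q}{2}}} \|H_{S_0^c}\|_q^q \le \frac{1}{L^{1 - \frac{q}{2}}} \|H_{S_0}\|_q^q \le \frac{K^{1 - \frac{q}{2}}}{L^{1 - \frac{q}{2}}} \|H_{S_0}\|_F^q = \frac{1}{a^{1 - \frac{q}{2}}} \|H_{S_0}\|_F^q. \tag{42}$$

Let $b' = a^{1 - \frac{q}{2}} \ge b$. Combining (39) and (42), we have

$$0 \ge r(1 - \delta_{L+K}) \|H_{S_{01}}\|_F^q - \frac{r(1 + \delta_L)}{b'} \|H_{S_0}\|_F^q$$

$$\ge r(1 - \delta_{L+K}) \|H_{S_{01}}\|_F^q - \frac{r(1 + \delta_L)}{b} \|H_{S_{01}}\|_F^q$$

$$= r \left( 1 - \delta_{(a+1)K} - \frac{1 + \delta_{aK}}{b} \right) \|H_{S_{01}}\|_F^q.$$

Hence $\|H_{S_{01}}\|_F = 0$ if $\delta_{aK} + b \, \delta_{(a+1)K} < b - 1$. Thus by (38) we conclude that $H = 0$.

□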



4 Probabilistic Analysis
4.1 Gaussian random linear map

In this section, we will demonstrate that one can succeed, with overwhelmingly high probability,
in recovering a low rank matrix using the minimization program $(P_q)$. Our result is based on the
following type of random linear measurement maps. Let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ be a linear operator given
by $[\mathcal{A}(X)]_k = \mathrm{Tr}(A_k X)$ for all $k = 1, \dots, m$, where the entries $(A_k)_{ij}$ are i.i.d. random variables having normal
distribution with mean zero and variance $\sigma^2$ for all $k = 1, \dots, m$, $i = 1, \dots, n_1$ and $j = 1, \dots, n_2$.
For any $X \in \mathbb{R}^{n_1 \times n_2}$, $\|\mathcal{A}(X)\|_q^q$ is a sum of $m$ independent and identically distributed copies of the random variable
$|\mathrm{Tr}(A_k X)|^q$. By direct calculation, the mean is

$$\mu \triangleq \mathbb{E}\big( |\mathrm{Tr}(A_k X)|^q \big) = \frac{(2\sigma^2)^{\frac{q}{2}} \, \|X\|_F^q \, \Gamma\!\left( \frac{q+1}{2} \right)}{\sqrt{\pi}}. \tag{43}$$

Define $\mu_q = \mu / \|X\|_F^q$, which is independent of $X$.
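The mean in (43) is the $q$-th absolute moment of a centred Gaussian with standard deviation $s = \sigma \|X\|_F$. The sketch below uses only the Python standard library; the function name `abs_moment`, the values of `q` and `s`, and the sample size are illustrative assumptions. It compares the closed form with a Monte Carlo estimate.

```python
import math
import random

def abs_moment(q, s):
    """Closed form of E|Z|^q for Z ~ N(0, s^2)."""
    return (2 * s * s) ** (q / 2) * math.gamma((q + 1) / 2) / math.sqrt(math.pi)

random.seed(0)
q, s = 0.5, 1.7          # s plays the role of sigma * ||X||_F
samples = [abs(random.gauss(0.0, s)) ** q for _ in range(200_000)]
mc_estimate = sum(samples) / len(samples)
closed_form = abs_moment(q, s)
```

As a quick consistency check of the formula, setting $q = 2$ recovers the variance $s^2$.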



4.2 Probabilistic guarantee of successful recovery 

The main result of this section is the following theorem. It provides a probabilistic guarantee for the
successful recovery of a low rank matrix using program $(P_q)$.

Theorem 8. Let $\mathcal{A}$ be the random linear operator as mentioned above, $q, \delta \in (0, 1)$, $K > 0$ be an
integer and $r = m \mu_q$. Then there exist constants $c_0, c_1 > 0$ depending on $\delta$ such that, with probability
at least $1 - \exp(-c_1 m)$, the q-isometry constant $\delta_{K,r}(\mathcal{A}) \le \delta$ whenever $m \ge c_0 (n_1 + n_2) K \log(n_1 n_2)$.

Before proving theorem 8, we need several lemmas.

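Lemma 7. Let $\mathcal{T} \subseteq \mathbb{R}^{n_1 \times n_2}$ be a fixed subspace of $n_1 \times n_2$ real matrices with $\dim(\mathcal{T}) = d \le m$. Let
$0 < q \le 1$, $\eta > 0$, $X \in \mathcal{T}$. Then

$$(1 - \eta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \eta) m \mu_q \|X\|_F^q \tag{44}$$

with probability exceeding $1 - 2 \exp\!\left( -\frac{\eta^2 m}{2 q c_q^2} \right)$, where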



c n < 



1.13 + V? 



r(2±i; 



7T 



(45) 



We will omit the proof of this lemma as the arguments are completely the same as those in the proof 
of lemma 3.2 in [8]. 



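Next we use lemma 7 to derive a large deviation inequality for the random variable $\|\mathcal{A}(X)\|_q^q$, and the
result is uniform in $X \in \mathcal{T}$.

Lemma 8. Let $\mathcal{T} \subseteq \mathbb{R}^{n_1 \times n_2}$ be a fixed subspace of $n_1 \times n_2$ real matrices with $\dim(\mathcal{T}) = d \le m$,
$\delta > 0$, $0 < q \le 1$. Let $\eta, \epsilon > 0$ be such that $\frac{\eta + \epsilon^q}{1 - \epsilon^q} \le \delta$. Then

$$(1 - \delta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \delta) m \mu_q \|X\|_F^q \quad \forall X \in \mathcal{T} \tag{46}$$

with probability exceeding $1 - \left( \frac{3}{\epsilon} \right)^d P_{m,q}(\eta)$, where

$$P_{m,q}(\eta) = 2 \exp\!\left( -\frac{\eta^2 m}{2 q c_q^2} \right). \tag{47}$$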

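Proof: Let $\mathcal{U}$ be the unit sphere in $\mathcal{T}$ with respect to $\|\cdot\|_F$, and $\mathcal{V}$ be an $\epsilon$-net of $\mathcal{U}$. Such a net has cardinality
at most $\left( \frac{3}{\epsilon} \right)^d$, see chapter 13 of [11] or lemma 4.3 of [6] for details. By lemma 7, for any fixed $X \in \mathcal{V}$,

$$P\left\{ \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \right| > \eta m \mu_q \right\} \le P_{m,q}(\eta). \tag{48}$$

Taking the union,

$$P\Big\{ \bigcup_{X \in \mathcal{V}} \left\{ \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \right| > \eta m \mu_q \right\} \Big\} \le \sum_{X \in \mathcal{V}} P_{m,q}(\eta) \le \left( \frac{3}{\epsilon} \right)^d P_{m,q}(\eta), \tag{49}$$

$$P\Big\{ \bigcap_{X \in \mathcal{V}} \left\{ \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \right| \le \eta m \mu_q \right\} \Big\} \ge 1 - \left( \frac{3}{\epsilon} \right)^d P_{m,q}(\eta). \tag{50}$$

Let $X \in \mathcal{U}$. We can pick $X_0 \in \mathcal{V}$ such that $\|X - X_0\|_F \le \epsilon$. Define $\epsilon_1 = \|X - X_0\|_F$, then $\frac{1}{\epsilon_1}(X - X_0) \in \mathcal{U}$
and we can find $X_1 \in \mathcal{V}$ such that $\left\| \frac{1}{\epsilon_1}(X - X_0) - X_1 \right\|_F \le \epsilon$. Continuing in the same fashion, we
obtain sequences $\{\epsilon_n\}$ and $\{X_n\} \subset \mathcal{V}$ such that $|\epsilon_n| \le \epsilon^n$, and

$$\Big\| X - \sum_{n=0}^{N} \epsilon_n X_n \Big\|_F \le \epsilon^{N+1}, \tag{51}$$

where $\epsilon_0 = 1$. Hence

$$\|\mathcal{A}(X)\|_q^q = \Big\| \sum_{n=0}^{\infty} \epsilon_n \mathcal{A}(X_n) \Big\|_q^q \le \sum_{n=0}^{\infty} |\epsilon_n|^q \|\mathcal{A}(X_n)\|_q^q \le \frac{1 + \eta}{1 - \epsilon^q} \, m \mu_q. \tag{52}$$

Also, by the triangle inequality,

$$\|\mathcal{A}(X)\|_q^q \ge \|\mathcal{A}(X_0)\|_q^q - \Big\| \sum_{n=1}^{\infty} \epsilon_n \mathcal{A}(X_n) \Big\|_q^q$$

$$\ge (1 - \eta) m \mu_q - \sum_{n=1}^{\infty} |\epsilon_n|^q \|\mathcal{A}(X_n)\|_q^q$$

$$\ge \left( 1 - \frac{\eta + \epsilon^q}{1 - \epsilon^q} \right) m \mu_q \|X\|_F^q.$$

Therefore we have

$$(1 - \delta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \delta) m \mu_q \|X\|_F^q \quad \forall X \in \mathcal{U}. \tag{53}$$

Now let $0 \ne X \in \mathcal{T}$ be arbitrary. Then $\frac{X}{\|X\|_F} \in \mathcal{U}$, and so

$$(1 - \delta) m \mu_q \le \left\| \mathcal{A}\!\left( \frac{X}{\|X\|_F} \right) \right\|_q^q \le (1 + \delta) m \mu_q. \tag{54}$$

By the linearity of $\mathcal{A}$,

$$(1 - \delta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \delta) m \mu_q \|X\|_F^q. \tag{55}$$

□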



Next, we will show that the result of lemma 8 is robust to small perturbations of the subspace $\mathcal{T}$.
Towards this end, we need to introduce the following distance measure between two subspaces:

$$\rho(\mathcal{T}_1, \mathcal{T}_2) \triangleq \| P_{\mathcal{T}_1} - P_{\mathcal{T}_2} \|, \tag{56}$$

where $P_{\mathcal{T}_i}$ is the orthogonal projection associated with the $d$-dimensional subspace $\mathcal{T}_i$ of $\mathbb{R}^{n_1 \times n_2}$, and
$\|\cdot\|$ denotes the operator norm.
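The distance (56) is straightforward to compute once each subspace is given by a basis. In the sketch below (assuming NumPy), the helper names `projector` and `rho` are hypothetical, and subspaces are represented by matrices whose columns are basis vectors. For subspaces of matrices one would first vectorize the basis elements, which is an assumption of this illustration rather than something spelled out in the text.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of `basis` (full column rank assumed)."""
    Q, _ = np.linalg.qr(basis)     # orthonormal basis of the span
    return Q @ Q.T

def rho(basis1, basis2):
    """Operator-norm distance between the two orthogonal projectors."""
    return np.linalg.norm(projector(basis1) - projector(basis2), ord=2)

# Two lines in the plane separated by an angle theta: the distance equals sin(theta).
theta = 0.3
T1 = np.array([[1.0], [0.0]])
T2 = np.array([[np.cos(theta)], [np.sin(theta)]])
distance = rho(T1, T2)
```

In this two-dimensional example the difference of projectors has eigenvalues $\pm \sin\theta$, so a small angle between the subspaces corresponds to a small $\rho$.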

The following lemma quantifies the change in the q-isometry constant when the subspace is slightly 
perturbed. 



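Lemma 9. Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be $d$-dimensional subspaces of $\mathbb{R}^{n_1 \times n_2}$. Suppose that for all $X \in \mathcal{T}_1$,

$$(1 - \delta) \|X\|_F^q \le \frac{1}{r} \|\mathcal{A}(X)\|_q^q \le (1 + \delta) \|X\|_F^q \tag{57}$$

for some constants $r > 0$, $0 < \delta < 1$. Then for all $Y \in \mathcal{T}_2$,

$$(1 - \delta') \|Y\|_F^q \le \frac{1}{r} \|\mathcal{A}(Y)\|_q^q \le (1 + \delta') \|Y\|_F^q \tag{58}$$

with

$$\delta' = \delta + \left[ 1 + \frac{m \|\mathcal{A}\|^q}{r} \right] \rho(\mathcal{T}_1, \mathcal{T}_2)^q. \tag{59}$$

Proof: For any $Y \in \mathcal{T}_2$,

$$\|\mathcal{A}(Y)\|_q^q \le \|\mathcal{A}(P_{\mathcal{T}_1}(Y))\|_q^q + \|\mathcal{A}((P_{\mathcal{T}_1} - P_{\mathcal{T}_2})(Y))\|_q^q$$

$$\le r(1 + \delta) \|P_{\mathcal{T}_1}(Y)\|_F^q + m \|\mathcal{A}((P_{\mathcal{T}_1} - P_{\mathcal{T}_2})(Y))\|_2^q$$

$$\le r(1 + \delta) \|Y\|_F^q + m \|\mathcal{A}\|^q \rho(\mathcal{T}_1, \mathcal{T}_2)^q \|Y\|_F^q$$

$$= \left[ r(1 + \delta) + m \|\mathcal{A}\|^q \rho(\mathcal{T}_1, \mathcal{T}_2)^q \right] \|Y\|_F^q.$$

Similarly, we have

$$\|\mathcal{A}(Y)\|_q^q \ge \|\mathcal{A}(P_{\mathcal{T}_1}(Y))\|_q^q - \|\mathcal{A}((P_{\mathcal{T}_1} - P_{\mathcal{T}_2})(Y))\|_q^q$$

$$\ge r(1 - \delta) \|P_{\mathcal{T}_1}(Y)\|_F^q - m \|\mathcal{A}\|^q \rho(\mathcal{T}_1, \mathcal{T}_2)^q \|Y\|_F^q$$

$$\ge r(1 - \delta) \left[ \|Y\|_F^q - \|(P_{\mathcal{T}_1} - P_{\mathcal{T}_2})(Y)\|_F^q \right] - m \|\mathcal{A}\|^q \rho(\mathcal{T}_1, \mathcal{T}_2)^q \|Y\|_F^q$$

$$\ge r(1 - \delta) \left[ \|Y\|_F^q - \|P_{\mathcal{T}_1} - P_{\mathcal{T}_2}\|^q \|Y\|_F^q \right] - m \|\mathcal{A}\|^q \rho(\mathcal{T}_1, \mathcal{T}_2)^q \|Y\|_F^q$$

$$\ge r \left[ 1 - \delta - \left( 1 + \frac{m \|\mathcal{A}\|^q}{r} \right) \rho(\mathcal{T}_1, \mathcal{T}_2)^q \right] \|Y\|_F^q.$$

□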



In order to show that the above concentration result for a fixed subspace of matrices of rank less than or equal to $K$
is actually robust to perturbation of the subspace, we first need to parametrize the collection
of all these subspaces so that the union, over the parameters, is the set of all matrices of rank less
than or equal to $K$, and then approximate this parametrized collection by a finite subcollection. The
degree of the approximation accuracy will be determined later.

Let $E \subset \mathbb{R}^{n_1}$ and $F \subset \mathbb{R}^{n_2}$ be fixed subspaces of dimension $K$. Then the set of all $n_1 \times n_2$ matrices whose
column space is contained in $E$ and row space is contained in $F$ forms a $K^2$-dimensional subspace of
matrices of rank less than or equal to $K$. Denote this subspace by $\Sigma(E, F) \subset \mathbb{R}^{n_1 \times n_2}$ and define

$$\Sigma_{n_1, n_2, K} = \{ \Sigma(E, F) : E \subset \mathbb{R}^{n_1} \text{ and } F \subset \mathbb{R}^{n_2} \text{ are subspaces of dimension } K \}. \tag{60}$$

The following lemma by Fazel et al. [6] characterizes the number of subspaces needed to cover this set
to an arbitrary resolution.

Lemma 10. The cardinality $\mathcal{K}_\epsilon$ of the $\epsilon$-net of $\Sigma_{n_1, n_2, K}$ is bounded by

$$\mathcal{K}_\epsilon \le \left( \frac{2 C_0}{\epsilon} \right)^{K(n_1 + n_2 - 2K)}, \tag{61}$$

where $C_0$ is a constant independent of $\epsilon$, $n_1$, $n_2$ and $K$.

Finally, we need to quote one more concentration result for the norm of the random linear measurement
operator $\mathcal{A}$ to help us establish the main result of this section. See [6] and [12] for details.

Lemma 11. Let $\mathcal{A}$ be as specified at the beginning of this section. Then

$$P\left\{ \|\mathcal{A}\| \ge 1 + \sqrt{\frac{n_1 n_2}{m}} + t \right\} \le \exp(-\gamma m t^2) \tag{62}$$

for some constant $\gamma > 0$.



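We are now ready to prove the main theorem of this section.
Proof: (of Theorem 8)

Let $\mathcal{N} = \{ \Sigma(E_i, F_i) \}_{i=1}^{k}$ be an $\epsilon$-net of $\Sigma_{n_1, n_2, K}$ for $\epsilon > 0$. By lemma 10, $k \le \mathcal{K}_\epsilon$. For each pair
$(E_i, F_i)$, define the set of matrices

$$B_i = \{ X \mid \exists E, F \text{ such that } X \in \Sigma(E, F) \text{ and } \rho(\Sigma(E, F), \Sigma(E_i, F_i)) \le \epsilon \}. \tag{63}$$

Also, let $\eta, \tilde{\epsilon} > 0$ be such that

$$\frac{\eta + \tilde{\epsilon}^q}{1 - \tilde{\epsilon}^q} \le \frac{\delta}{2}. \tag{64}$$

Since $\mathcal{N}$ is an $\epsilon$-net, the union of all the $B_i$ is equal to the set of all $n_1 \times n_2$ matrices of rank less than
or equal to $K$. Hence we have

$$P\{ \delta_{K,r}(\mathcal{A}) \le \delta \} = P\{ (1 - \delta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \delta) m \mu_q \|X\|_F^q \ \ \forall X \text{ s.t. } \mathrm{Rank}(X) \le K \}$$

$$\ge P\{ \forall i \ [ (1 - \delta) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \delta) m \mu_q \|X\|_F^q \ \ \forall X \in B_i ] \} \tag{65}$$

$$\ge P\Big\{ \forall i \ \big[ (1 - \tfrac{\delta}{2}) m \mu_q \|X\|_F^q \le \|\mathcal{A}(X)\|_q^q \le (1 + \tfrac{\delta}{2}) m \mu_q \|X\|_F^q \ \ \forall X \in \Sigma(E_i, F_i) \big] \ \text{ and } \ \Big( 1 + \frac{\|\mathcal{A}\|^q}{\mu_q} \Big) \epsilon^q \le \frac{\delta}{2} \Big\} \tag{66}$$

$$\ge 1 - \sum_{i=1}^{k} P\Big\{ \exists X \in \Sigma(E_i, F_i) : \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \|X\|_F^q \right| > \frac{\delta}{2} m \mu_q \|X\|_F^q \Big\} - P\Big\{ \|\mathcal{A}\| > \Big[ \frac{\delta \mu_q}{2 \epsilon^q} - \mu_q \Big]^{\frac{1}{q}} \Big\}. \tag{67}$$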

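Inequality (66) follows directly from lemma 9 and inequality (67) is a consequence of the union bound.
We then bound these two quantities separately. First, by lemma 8 and lemma 10, we have

$$\sum_{i=1}^{k} P\Big\{ \exists X \in \Sigma(E_i, F_i) : \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \|X\|_F^q \right| > \frac{\delta}{2} m \mu_q \|X\|_F^q \Big\} \le \left( \frac{2 C_0}{\epsilon} \right)^{K(n_1 + n_2 - 2K)} \left( \frac{3}{\tilde{\epsilon}} \right)^{K^2} P_{m,q}(\eta). \tag{68}$$

Second, by lemma 11, there exists a constant $\gamma > 0$ such that

$$P\Big\{ \|\mathcal{A}\| \ge \Big[ \frac{\delta \mu_q}{2 \epsilon^q} - \mu_q \Big]^{\frac{1}{q}} \Big\} \le \exp\left\{ -\gamma m \left( \Big[ \frac{\delta \mu_q}{2 \epsilon^q} - \mu_q \Big]^{\frac{1}{q}} - \sqrt{\frac{n_1 n_2}{m}} - 1 \right)^2 \right\}. \tag{69}$$

Now we have to choose sufficiently small $\epsilon$ so that this probability is less than $\exp(-c_1 m)$ for some
$c_1 > 0$. It is easy to see that if

$$\epsilon \le \left( \frac{\delta \mu_q}{2 \left[ \left( 2 \sqrt{\frac{n_1 n_2}{m}} + 1 \right)^q + \mu_q \right]} \right)^{\frac{1}{q}}, \tag{70}$$

then

$$P\Big\{ \|\mathcal{A}\| \ge \Big[ \frac{\delta \mu_q}{2 \epsilon^q} - \mu_q \Big]^{\frac{1}{q}} \Big\} \le \exp(-\gamma n_1 n_2) \le \exp(-\gamma m). \tag{71}$$

With this choice of $\epsilon$ (taking equality in (70)),

$$\sum_{i=1}^{k} P\Big\{ \exists X \in \Sigma(E_i, F_i) : \left| \, \|\mathcal{A}(X)\|_q^q - m \mu_q \|X\|_F^q \right| > \frac{\delta}{2} m \mu_q \|X\|_F^q \Big\}$$

$$\le \left( \frac{2^{1 + \frac{1}{q}} C_0 \left[ \left( 2 \sqrt{\frac{n_1 n_2}{m}} + 1 \right)^q + \mu_q \right]^{\frac{1}{q}}}{(\delta \mu_q)^{\frac{1}{q}}} \right)^{K(n_1 + n_2 - 2K)} \left( \frac{3}{\tilde{\epsilon}} \right)^{K^2} 2 \exp\!\left( -\frac{\eta^2 m}{2 q c_q^2} \right)$$

$$= \exp\left\{ K(n_1 + n_2 - 2K) \ln\left( \frac{2^{1 + \frac{1}{q}} C_0 \left[ \left( 2 \sqrt{\frac{n_1 n_2}{m}} + 1 \right)^q + \mu_q \right]^{\frac{1}{q}}}{(\delta \mu_q)^{\frac{1}{q}}} \right) + K^2 \ln\left( \frac{3}{\tilde{\epsilon}} \right) + \ln 2 - \frac{\eta^2 m}{2 q c_q^2} \right\}.$$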

Since < nin2 for all m > 1, there exists a constant Co independent of m,ni,n,2 and -ff such that 

2 

the sum of the first three terms in the exponent is strictly less than T^zini + n-i)K ln(nin2). Hence 



4i 

1 



E^H^II > (|f - < exp (^(m + ^hW " (72) 

It follows that there exists a constant $c_1$ independent of $m$, $n_1$, $n_2$ and $K$ such that the restricted
q-isometry constant $\delta_{K,r}(\mathcal{A})$ is less than or equal to $\delta$, with probability greater than $1 - e^{-c_1 m}$, whenever
$m \ge c_0 (n_1 + n_2) K \ln(n_1 n_2)$. □